PCA-Based Incremental Extreme Learning Machine (PCA-IELM) for COVID-19 Patient Diagnosis Using Chest X-Ray Images

The novel coronavirus disease 2019 (COVID-19), first reported in December 2019, has caused a pandemic. It has had severe consequences for people's daily lives, healthcare, and the world economy. According to the World Health Organization's most recent statistics, COVID-19 has become a worldwide pandemic, and the numbers of infected persons and fatalities are growing at an alarming rate. An effective system for the early detection of COVID-19 patients is urgently needed to curb further spread of the virus from affected persons. Therefore, to identify positive cases early and to support radiologists in the automatic diagnosis of COVID-19 from X-ray images, a novel method, PCA-IELM, is proposed based on principal component analysis (PCA) and the incremental extreme learning machine (IELM). The key contribution of the method is that it combines the benefits of PCA and the IELM. PCA-IELM reduces the input dimension by extracting the most important information from an image, which can effectively improve COVID-19 patient prediction performance. In addition, PCA-IELM has a faster training speed than a multi-layer neural network. The proposed approach was tested on a COVID-19 chest X-ray image dataset. The experimental results indicate that PCA-IELM outperforms PCA-SVM and PCA-ELM in terms of accuracy (98.11%), precision (96.11%), recall (97.50%), F1-score (98.50%), etc., and training speed.


Introduction
The World Health Organization (WHO) declared COVID-19 (caused by the virus known as SARS-CoV-2) a worldwide pandemic in March 2020. This triggered unprecedented countermeasures, such as the closure of cities and districts and restrictions on foreign travel. Coronaviruses (CoV) are potentially deadly viruses that may cause severe acute respiratory syndrome (SARS-CoV). Various researchers and institutions have sought effective solutions from different possible dimensions in countering the COVID-19 pandemic. Multimedia data (audio, images, video, etc.) are growing massively alongside text information as civilization enters the information era. Image classification has become more essential as the need for real-world vision systems grows [1] and has recently attracted a lot of attention from researchers. It has evolved into one of the most essential operations, serving as a prerequisite for other image processing operations. Image classification using learning algorithms is an open issue in image processing that has sparked a lot of interest due to its promising applications. In general, an image categorization system has two primary stages. The first stage is to create an effective image representation that carries enough information about the image to allow subsequent classification. The second stage is to use a good classifier to classify the new image. Thus, there are two major challenges to consider when improving image classification performance: dimensionality reduction and the classifier. Apart from computer vision and image operations, one of the most important stages in image classification is feature extraction, which determines the invariant characteristics of images when computers are used to assess and handle image data.
In practice, feature extraction has been applied in many fields, such as historic structures, medical image processing, and remote sensing. An image's essential lower-level qualities include color, texture, and shape. The color feature is global and may be retrieved using tools such as the color histogram, color set, and color moment. It simply describes the proportions of different colors across the image. Color is a useful characteristic for identifying images that are difficult to distinguish automatically when spatial variation can be ignored. However, it cannot describe the local distribution of the image or the spatial positions of the distinct colors. This paper proposes image classification with feature extraction using incremental extreme learning machines. First, features are extracted with PCA from the COVID-19 dataset of chest X-ray images. Then, once the dimension has been reduced by PCA, the SVM, ELM, and IELM are applied to image classification [2]. Different metrics were employed to achieve a robust evaluation: classification accuracy, recall, precision, F-score, true-negative rate (TNR), true-positive rate (TPR), AUC, G-mean, the precision-recall curve, and the receiver operating characteristic (ROC) curve. The paper is arranged as follows: several related approaches are discussed in Section 2. The suggested technique is described and critiqued in Section 3. Section 4 contains a description of PCA and feature extraction techniques. Subsections 4.1-4.6 contain the different algorithmic approaches that are compared with the proposed method. Section 5 discusses the proposed method and algorithm. Section 6 describes the evaluation criteria that are used. Section 7 discusses the experimental setup. Section 8 describes the dataset. Finally, Section 9 discusses the experimental results, and the research is concluded.

Related Works
The content of image features comprises color, texture, and other visual elements. The content extracted from visual features is the main component for analyzing an image. This section discusses earlier work based on PCA and other feature extraction techniques, along with different classification techniques.
Sun et al. [3] suggested an image classification system based on multi-view depth characteristics and principal component analysis. In this method, depth features are extracted from the image, characteristics are extracted independently from the RGB and depth views, and PCA is applied to reduce the dimension. The Scene15, Caltech256, and MIT Indoor datasets are used in the evaluation process. Finally, the SVM [4] is used to classify images. The method's performance is demonstrated by the experimental results.
Mustaqeem and Saqib [5] suggested a hybrid method based on PCA and SVM. PROMISE data (KC1: 2109 observations, CM1: 344 observations) from NASA's directory were used for the experiment. The dataset was divided into two parts: training (KC1: 1476 observations, CM1: 240 observations) and testing (KC1: 633 observations, CM1: 104 observations). The principal components of the features are extracted by PCA, which helps in reducing dimensionality and minimizing time complexity.
In addition, SVM is used for classification, and GridSearchCV is used for hyperparameter tuning. The resulting precision, recall, F-measure, and accuracy are 86.8%, 99.6%, 92.8%, and 86.6%, respectively, for the KC1 dataset and 96.1%, 99.0%, 97.5%, and 95.2%, respectively, for the CM1 dataset. Similarly, Castaño et al. [6] provide a deterministic approach for initializing ELM training based on hidden node parameters with an activation function. The hidden node parameters are calculated with the help of the Moore-Penrose generalized inverse, whereas the output node parameters are recovered through principal component analysis. The algorithm was validated experimentally on fifteen well-known datasets. The Bonferroni-Dunn, Nemenyi, and Friedman tests were used to compare the results obtained. In comparison with later ELM advancements, this technique significantly reduces computing costs and outperforms them.
Zhao et al. [8] suggested a class-incremental extreme learning machine in which the model is built without iteration from supervised samples. The algorithm is shown to be stable and to have almost the same accuracy as batch learning. Similarly, Huang and Chen [9] proposed a convex incremental extreme learning machine, which analytically calculates the output weights of the hidden nodes after randomly generating computational nodes and adding them to the hidden layer. Using convex optimization, the output weights of the existing hidden nodes are recalculated. This method can converge faster while maintaining efficiency and simplicity.
Zhu et al. [10] proposed a principal component analysis (PCA)-based categorization system with a kernel-based extreme learning machine (KELM). According to the reported results, this model achieves better accuracy than SVM and other traditional classification methods. For the classification of HSIs, Kang et al. [11] developed the PCA-EPF extraction approach, which combines PCA with standard edge-preserving filtering (EPF)-based feature extraction. The proposed method achieves better classification accuracy with limited training samples. Similarly, Perales-González et al. [12] introduced a new ELM architecture based on the negative correlation learning framework, dubbed negative correlation hidden layer ELM (NCHL-ELM). By integrating a parameter into each node of the original ELM hidden layer, this model shows better accuracy than other classifiers. Based on fractal dimension technology, Li et al. [13] suggested an enhanced ELM algorithm (F-ELM). By reducing the dimension of the hidden layer, the model improves training speed. The experimental results show that, compared with the standard ELM technique, the suggested algorithm significantly reduces computing time while also improving inversion accuracy and algorithm stability.
Because of the complexity of the data models, deep learning is very expensive to train. Furthermore, deep learning requires high-priced GPUs and hundreds of machines. There is no simple rule for choosing the best deep learning tools, since the choice requires an understanding of topology, training technique, and other characteristics, whereas the simple ELM is a one-shot computation with a rapid learning pace. The biggest advantage of IELM is the ability to add randomly generated hidden nodes incrementally and to fix the output weights analytically. The output error of the IELM diminishes rapidly as the number of hidden neurons increases.

Proposed Methodology
The backpropagation (BP) approach is commonly used to train a multi-layer perceptron (MLP). Various algorithms can be used to train this typical architecture; gradient-based and heuristic algorithms are the two types most commonly used. These algorithms have a few things in common: they have difficulty dealing with enormous amounts of data, and they converge slowly in these situations. Huang et al. [15] introduced the extreme learning machine as a solution to this problem. This algorithm reduces the typical computing time required to train an SLFN using gradient-based techniques. The ELM, on the other hand, has several flaws. The randomly generated input weights and biases of the ELM [16] result in some network instability. If there are outliers in the training data, the hidden layer's output matrix becomes ill-conditioned, which results in poor generalization performance and lower forecasting accuracy. There are two types of ELM, called fixed ELM and IELM [17]. In comparison with the ELM, the output error of the IELM diminishes rapidly and tends toward zero as the number of hidden neurons grows [15]. This approach is prominent in online continuous-learning regression and classification problems [18,19].
A trained classifier is obtained by training on a sufficient amount of image data; new images are then fed into the trained classifier for observation and analysis.

Feature Extraction
A single feature cannot properly describe the characteristics and quality of an image. Image classification will not yield acceptable results unless distinguishing features are described. Three images corresponding to three viewpoints are placed on each RGB color image. Our method uses PCA to extract the image's important information and to minimize the input dimension [20][21][22][23].

Classification of Images and PCA Feature Extraction.
Extracting useful features from an image is a prominent task in image classification, and principal component analysis (PCA) is used for this purpose. PCA uses an orthogonal transformation to convert the variables into fewer uncorrelated components than the original variables. The output data obtained with this approach do not lose important data features, and the PCA loadings can be used to identify important data. PCA is a multivariate statistical analysis approach that performs a linear transformation of numerous variables to pick a few key variables. PCA uses eigenvectors to transform data from N dimensions to M dimensions, where M < N.
The new features are linear combinations of the old ones, allowing them to capture the data's intrinsic variability with little information loss. Figure 1 shows the steps of the proposed model.
Suppose that the research object has p indexes; these indexes are regarded as p random variables and represented as $X_1, X_2, \ldots, X_p$. New indexes $F_1, F_2, \ldots, F_p$ are created as linear combinations of the p random variables and mirror the data in the original indexes [24]. These mutually independent replacement indexes reflect the essential information of the original indexes:
$$F_i = a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{ip}X_p, \quad i = 1, 2, \ldots, p. \tag{1}$$
The PCA stages in detail are as follows:
(1) Data standardization: the matrix X with n observations is standardized column by column, where $\bar{x}_j$ and $s_j$ are the sample mean and standard deviation of the jth index:
$$y_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}.$$
(2) The correlation coefficient matrix R is computed as
$$R = (r_{jk})_{p \times p}, \quad r_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} y_{ij}\,y_{ik}.$$
(3) The eigenvalues and eigenvectors of the coefficient matrix are calculated from
$$|R - \lambda I| = 0, \quad R\,a_i = \lambda_i a_i.$$
The calculated eigenvector is $a_i = (a_{i1}, a_{i2}, \ldots, a_{ip})$, where $i = 1, 2, \ldots, p$, and the eigenvalue is $\lambda_i$ ($i = 1, 2, \ldots, p$). To get a collection of principal components $F_i$, the eigenvalues are sorted in descending order:
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0.$$
(4) The contribution rate of the kth principal component is expressed as
$$\frac{\lambda_k}{\sum_{i=1}^{p}\lambda_i},$$
and the cumulative contribution rate of the first k principal components is expressed as
$$\frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{p}\lambda_i}.$$
The first principal component, $F_1$, is the one with the highest variance among all linear combinations of the standardized variables $Y_1, Y_2, \ldots, Y_p$; the second principal component, $F_2$, is the one with the highest variance among all such combinations that are uncorrelated with $F_1$.
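The PCA stages described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `pca_correlation`, the synthetic demo data, and the choice of `numpy.linalg.eigh` are assumptions made here, and every column is assumed to have nonzero variance.

```python
import numpy as np

def pca_correlation(X, k):
    """Hypothetical helper: project X (n_samples x p) onto its first k principal components."""
    X = np.asarray(X, dtype=float)
    # Stage 1: standardize each column (zero mean, unit sample standard deviation).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Stage 2: correlation coefficient matrix R (p x p).
    R = (Z.T @ Z) / (X.shape[0] - 1)
    # Stage 3: eigenvalues and eigenvectors; eigh is suited to symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]            # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Stage 4: contribution rate and cumulative contribution rate.
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    scores = Z @ eigvecs[:, :k]                  # the first k principal components
    return scores, contribution, cumulative

# Synthetic demo data (an assumption; not the X-ray dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))
X_demo[:, 1] = 2.0 * X_demo[:, 0] + 0.1 * X_demo[:, 1]   # make two columns correlated
scores, contribution, cumulative = pca_correlation(X_demo, k=2)
```

In practice, k would typically be chosen as the smallest value whose cumulative contribution rate exceeds a chosen threshold; the threshold itself is a design choice that the text does not specify.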

SVM.
Several algorithms have been implemented and suggested in machine learning to solve the classification problem. Among the different classification algorithms, the support vector machine (SVM) is a supervised machine learning algorithm with the following advantages [5,25]:
(i) It employs L2 regularization to overcome overfitting problems.
(ii) It provides suitable results even with minimal data.
(iii) It offers different kernel functions to match the features' complicated functions and interactions.
(iv) It manages nonlinearity in the data.
(v) The model is stable thanks to the hyper-plane splitting rule.
(vi) It can analyze data with a high degree of dimensionality.
Instead of focusing on decreasing prediction error, SVM focuses on optimizing the classification decision boundary, which is why a hyper-plane is used to separate the classes. If the data dimension is n, the hyper-plane is an (n − 1)-dimensional surface that can be represented mathematically as
$$w_1x_1 + w_2x_2 + \cdots + w_nx_n + b = 0,$$
or, in a broader sense, as
$$w^{T}x + b = 0,$$
where x denotes the input feature vector, w is the weight vector, and b is the bias. By adjusting w and b, several hyper-planes can be created, but the hyper-plane with the best margin is chosen. The ideal margin is defined as the largest feasible perpendicular distance between each class and the hyper-plane. The cost function (objective function) is minimized to obtain the best margin and may be written as
$$J(w) = \frac{1}{2}\lVert w\rVert^{2} + C\,\frac{1}{N}\sum_{i=1}^{N}\max\bigl(0,\; 1 - y_i(w^{T}x_i + b)\bigr), \tag{11}$$
where C is the regularization parameter and N is the number of training samples. Even if the predictions are right and the data are correctly categorized by the hypothesis, SVM penalizes any samples that lie close to the boundary, i.e., those with $0 < y_i(w^{T}x_i + b) < 1$. The main goal is to find the value of w that minimizes J(w), so differentiating Eq. (11) with respect to w gives the gradient of the cost function:
$$\nabla_w J(w) = \frac{1}{N}\sum_{i=1}^{N}\begin{cases} w, & \text{if } \max\bigl(0,\; 1 - y_i(w^{T}x_i + b)\bigr) = 0,\\ w - C\,y_i x_i, & \text{otherwise.}\end{cases}$$
Once $\nabla_w J(w)$ has been calculated, the weights are updated as
$$w \leftarrow w - \alpha\,\nabla_w J(w),$$
where α is the learning rate. The procedure is repeated until the smallest J(w) is found. Because data are rarely linearly separable, a decision boundary must be drawn between the classes rather than a hyper-plane. To deal with the dataset's nonlinearity, the linear hyper-plane is converted into the decision boundary
$$f(x) = w^{T}\phi(x) + b, \tag{14}$$
where ϕ(x) is the kernel function in (14). Various types of kernel functions may be used to build an SVM, such as linear, polynomial, and exponential, but the radial basis function (RBF) is used in this model. The distance used is the Euclidean distance, and the smoothness of the boundary is defined by the parameter σ:
$$K(x, \bar{x}) = \exp\!\left(-\frac{\lVert x - \bar{x}\rVert^{2}}{2\sigma^{2}}\right),$$
where $\lVert x - \bar{x}\rVert^{2}$ is the squared Euclidean distance between any single observation x and the mean of the training sample $\bar{x}$.
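The cost function, its gradient, and the weight update described above can be illustrated with a small NumPy sketch. It is not the authors' code: the hinge-loss form of the cost, the regularization parameter `C`, the learning rate `lr`, the bias update, and the synthetic two-cluster data are all assumptions made for illustration, and only the linear case is trained.

```python
import numpy as np

def svm_cost(w, b, X, y, C):
    """Regularized hinge-loss cost J(w), in the form assumed in this sketch."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * (w @ w) + C * hinge.mean()

def train_linear_svm(X, y, C=10.0, lr=0.01, epochs=500):
    """Batch gradient descent on J(w); labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1.0        # samples inside the margin or misclassified
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -C * y[active].sum() / n
        w = w - lr * grad_w                   # weight update step
        b = b - lr * grad_b
    return w, b

def rbf_kernel(x, x_ref, sigma=1.0):
    """RBF kernel based on the squared Euclidean distance."""
    return float(np.exp(-np.sum((x - x_ref) ** 2) / (2.0 * sigma ** 2)))

# Synthetic, linearly separable demo data (an assumption).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal([2.0, 2.0], 0.5, size=(40, 2)),
                    rng.normal([-2.0, -2.0], 0.5, size=(40, 2))])
y_demo = np.concatenate([np.ones(40), -np.ones(40)])
w, b = train_linear_svm(X_demo, y_demo)
train_accuracy = float(np.mean(np.sign(X_demo @ w + b) == y_demo))
```

A kernelized SVM such as the RBF variant is normally trained in the dual form with a dedicated solver rather than by this primal update, so the sketch should be read only as a picture of how the cost and gradient interact.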

PCA-SVM.
The aim of the support vector machine (SVM) [3] is to find the best possible hyper-plane separating the two classes in the training set. The coefficient of the hyper-plane to be determined is w. SVM uses structural risk minimization theory to build the best hyper-plane segmentation in the feature space and yields a learner that attains global optimization.
Assume the training data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$. These data can be separated by a hyper-plane:
$$w \cdot x + b = 0.$$
After normalization,
$$y_i(w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, n.$$
The classification interval is equal to $2/\lVert w\rVert$, so maximizing the interval is equivalent to minimizing $\lVert w\rVert^{2}$.
Before the data are classified by SVM, the necessary features need to be extracted from the image data, which converts the high-dimensional data into low-dimensional data. For this purpose, PCA is used as the feature extraction method, based on the covariance matrix and the calculation of eigenvalue proportions. After that, SVM is used to classify the low-dimensional data. PCA-based SVM can be used for both classification and regression. Figure 2 depicts the workflow of PCA-SVM. Once parameter optimization is done, the model is ready to predict the category.

Extreme Learning Machine (ELM).
An extreme learning machine (ELM) is a single-hidden-layer feedforward network (SLFN) that can be used for both classification and regression. In ELM [26], the weights between the input layer and the hidden layer and the hidden biases are randomly generated. The output weights are calculated using the Moore-Penrose generalized pseudoinverse. ELM trains faster than other feedforward networks [27] and outperforms other iterative methods.
Consider N training instances $(x_i, t_i)$. Let the number of input neurons be equal to the number of input features and be represented by m; similarly, let L be the number of hidden neurons. The number of output neurons is equal to the number of classes and is denoted by C. Figure 4 [24] shows the flowchart of the principal component analysis [28]. The input weight matrix is represented by $U = [u_1, u_2, \ldots, u_j, \ldots, u_L]^{T} \in \mathbb{R}^{L \times m}$, and the hidden neuron bias is represented by $b = [b_1, b_2, \ldots, b_L]^{T} \in \mathbb{R}^{L}$, where $u_j = [u_{j1}, u_{j2}, \ldots, u_{jm}]$ are the weights connecting the jth hidden neuron with the input neurons. The bias of the jth hidden neuron is $b_j$, and the output of the jth hidden neuron for the ith instance is represented by
$$h_j(x_i) = g(u_j \cdot x_i + b_j).$$
Here, the activation function is represented by g. For all the training instances, the hidden layer output is represented by H:
$$H = \begin{bmatrix} h_1(x_1) & \cdots & h_L(x_1)\\ \vdots & \ddots & \vdots\\ h_1(x_N) & \cdots & h_L(x_N)\end{bmatrix} \in \mathbb{R}^{N \times L}. \tag{19}$$
The output weight β between the hidden layer and the output layer can be computed using Eq. (20); the output layer uses a linear activation function in this computation:
$$\beta = H^{\dagger}T. \tag{20}$$
Here, $H^{\dagger}$ is the Moore-Penrose generalized inverse of H, and $T = [t_1, t_2, \ldots, t_N]^{T} \in \mathbb{R}^{N \times C}$ is the target matrix.
Input: Given N observations along with the class labels $(x_i, t_i)$, $x_i \in \mathbb{R}^m$, $t_i \in \mathbb{R}^C$. Output: SVM classification model.
The output function is $f(x) = [f_1(x), \ldots, f_C(x)] = h(x)\beta$. From Eq. (23), the class label of x can be predicted:
$$\mathrm{label}(x) = \arg\max_{k \in \{1, \ldots, C\}} f_k(x). \tag{23}$$
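The ELM computation described above (random input weights and biases, hidden-layer output H, output weights from the Moore-Penrose pseudoinverse, and an arg-max label) can be sketched as follows. The function names, the sigmoid activation, the uniform range [−1, 1], and the synthetic data are assumptions made for this illustration and are not taken from the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, T, n_hidden, rng):
    """X: N x m inputs, T: N x C one-hot targets. Returns (U, b, beta)."""
    m = X.shape[1]
    U = rng.uniform(-1.0, 1.0, size=(n_hidden, m))   # random input weights, L x m
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random hidden biases, length L
    H = sigmoid(X @ U.T + b)                         # hidden-layer output, N x L
    beta = np.linalg.pinv(H) @ T                     # output weights, L x C
    return U, b, beta

def predict_elm(X, U, b, beta):
    """Predicted class index: arg-max over the C linear outputs."""
    return np.argmax(sigmoid(X @ U.T + b) @ beta, axis=1)

# Synthetic two-class demo data (an assumption; not the X-ray features).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal([2.0, 2.0], 0.5, size=(40, 2)),
                    rng.normal([-2.0, -2.0], 0.5, size=(40, 2))])
labels_demo = np.concatenate([np.zeros(40, dtype=int), np.ones(40, dtype=int)])
T_demo = np.eye(2)[labels_demo]                      # one-hot target matrix, N x C
U, b, beta = train_elm(X_demo, T_demo, n_hidden=20, rng=rng)
train_accuracy = float(np.mean(predict_elm(X_demo, U, b, beta) == labels_demo))
```

Because no weights are tuned iteratively, training reduces to one matrix product and one pseudoinverse, which is the source of the speed advantage claimed for ELM.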

PCA-ELM : Classification Method Based on PCA-ELM.
In the PCA technique [6], variables are first scaled. The steps of PCA applied in PCA-ELM are scaling, computation of the covariance matrix, eigendecomposition, and projection onto the retained principal components. The output from PCA is given as an input to ELM [29]. The process of PCA-ELM [30] is shown in Figure 5.
Input: Given N observations along with the class labels $(x_i, t_i)$, $x_i \in \mathbb{R}^m$, $t_i \in \mathbb{R}^C$. Output: PCA-SVM model for classification.
(1) Procedure PCA-SVM.
(2) Identify the relationship among features through a covariance matrix.
(3) Through the linear transformation or eigendecomposition of the covariance matrix, obtain the eigenvectors and eigenvalues.
(4) Transform the data using the eigenvectors into principal components.
(5) Quantify the importance of these relationships using the eigenvalues and keep the important principal components.
(6) The data extracted from PCA are given as input to the SVM.

IELM.
Compared with other neural networks, the ELM learns faster because there is no need to adjust the hidden nodes, and it provides better generalization capability. However, the ELM has various flaws. The randomly generated biases and input weights of the ELM network [31] result in some network instability. Outliers in the training data affect the hidden layer's output matrix and result in poor network generalization performance. In comparison with the ELM, the output error of the IELM diminishes rapidly, and the IELM resolves the issues of very small output weights and of the validity of hidden-layer neurons. It is appropriate for regression and classification tasks in online continuous learning. The IELM [32] network model structure is shown in Figure 6. Suppose the numbers of inputs, hidden nodes, and outputs are m, l, and n, respectively, and ω is the input weight matrix of dimension l × m, whose row $\omega_i$ belongs to the ith hidden-layer neuron and is drawn uniformly at random from [−1, 1]. The bias $b_i$ of the ith hidden node is a random number uniformly distributed in [−1, 1], and the output weight matrix β has dimension l × n. The hidden node activation function is the sigmoid function, given by
$$g(x) = \frac{1}{1 + e^{-x}}, \tag{24}$$
where x is the input matrix.
Input: Given N observations along with the class labels $(x_i, t_i)$, $x_i \in \mathbb{R}^m$, $t_i \in \mathbb{R}^C$. Output: ELM classification model.
(1) ELM procedure.
(2) Randomly select the hidden biases b and input weights U.
(3) Compute the hidden layer output H from (19).
(4) Compute β, the weights between the hidden layer and the output layer, from (20).

Computational Intelligence and Neuroscience
A matrix X of dimension m × N represents the N input samples, and Y is an n × N matrix that represents the corresponding outputs, so that {(X, Y)} is the training set of N samples. The training steps of the IELM algorithm are described as follows:
Step 1. Initialization phase: set l = 0, and let L be the maximum number of hidden nodes. The initial value of the residual E (the difference between the target and the actual output) is set to the output Y, and ε is the expected training accuracy.
Step 2. Training phase: while l < L and $\lVert E\rVert > \varepsilon$:
(1) Increase the number of hidden nodes l by 1, i.e., $l = l + 1$.
(2) For the new hidden-layer neuron $O_l$, generate the input weights $\omega_l$ and bias $b_l$ randomly.
(3) Calculate the output of the activation function g(x′) for the node $O_l$, where $x' = \omega_l X + b_l$ ($b_l$ needs to be extended into a 1 × N vector).
(4) Calculate the hidden-layer neuron output vector $H = g(x')$.
(5) Calculate the output weight of the new node, $\beta_l = \dfrac{E\,H^{T}}{H\,H^{T}}$.
(6) After adding the new hidden node, calculate the residual error: $E = E - \beta_l H$.
The output weight $\beta_l$ reduces the network error. These steps are repeated until the residual error becomes smaller than ε; each iteration restarts with the random determination of the input weight $\omega_l$ and the bias $b_l$. Whether the trained network achieves the desired result can be determined from the test set (X′, Y′).
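The IELM training loop described above can be sketched in NumPy as follows. This is an illustrative reading, not the authors' code: the projection formula used for the output weight of each new node is the standard incremental ELM rule and is an assumption here, as are the function names, the Frobenius norm used for the stopping test, and the synthetic regression data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ielm(X, Y, max_nodes, eps, rng):
    """X: m x N inputs, Y: n x N targets (one column per sample, as in the text)."""
    m = X.shape[0]
    E = np.array(Y, dtype=float)                 # Step 1: residual starts at the targets
    W, B, BETA = [], [], []
    errors = [float(np.linalg.norm(E))]
    while len(W) < max_nodes and errors[-1] > eps:   # Step 2: add one node per pass
        w = rng.uniform(-1.0, 1.0, size=m)       # random input weights of the new node
        b = rng.uniform(-1.0, 1.0)               # random bias, broadcast over N samples
        h = sigmoid(w @ X + b)                   # output of the new node, shape (N,)
        beta = (E @ h) / (h @ h)                 # assumed rule: project residual onto h
        E = E - np.outer(beta, h)                # updated residual error
        W.append(w); B.append(b); BETA.append(beta)
        errors.append(float(np.linalg.norm(E)))
    return np.array(W), np.array(B), np.array(BETA), errors

def predict_ielm(X, W, B, BETA):
    H = sigmoid(W @ X + B[:, None])              # l x N hidden outputs
    return BETA.T @ H                            # n x N network outputs

# Synthetic regression demo (an assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(2, 100))
Y_demo = np.sin(np.pi * X_demo[0:1]) * X_demo[1:2]
W, B, BETA, errors = train_ielm(X_demo, Y_demo, max_nodes=50, eps=1e-3, rng=rng)
final_gap = float(np.linalg.norm(Y_demo - predict_ielm(X_demo, W, B, BETA)))
```

With this projection rule the residual norm cannot increase when a node is added, which matches the statement that the IELM output error diminishes as hidden nodes are added.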

Proposed PCA-Based Incremental ELM (PCA-IELM)
PCA uses an orthogonal transformation to extract meaningful characteristics from data [33] and may also be used to reduce the dimension of a large data collection. Principal components are extracted from the COVID-19 X-ray images using PCA and given as input to the IELM, which gradually adds randomly generated hidden nodes. A conventional SLFN with n hidden nodes can be expressed as
$$f_n(x) = \sum_{i=1}^{n}\beta_i\,g_i(x),$$
where $g_i(x) = g(a_i, b_i, x)$ denotes the output of the ith hidden node: $g_i(x) = g(a_i \cdot x + b_i)$ for additive nodes or $g_i(x) = g(b_i\lVert x - a_i\rVert)$ for RBF nodes. The ith hidden node and the output node are linked by the output weight $\beta_i$. In IELM, hidden nodes are randomly added to the existing network: the hidden node parameters $a_i$ and $b_i$ are randomly generated and then kept fixed, and the output weight $\beta_i$ is fixed once it has been computed.
Suppose the residual error function for the current network $f_n$ is defined as $e_n = f - f_n$, where n is the number of hidden nodes and $f \in L^2(X)$ is the target function. IELM is mathematically represented as
$$f_n(x) = f_{n-1}(x) + \beta_n\,g_n(x).$$
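A short worked derivation shows why adding a node in this way cannot increase the residual error. It uses the standard incremental ELM choice of the output weight, which is an assumption here since the text does not state the formula for $\beta_n$ explicitly; $\langle\cdot,\cdot\rangle$ denotes the inner product on $L^2(X)$.

```latex
\begin{align}
e_n &= f - f_n = e_{n-1} - \beta_n g_n, \\
\lVert e_n \rVert^2 &= \lVert e_{n-1} \rVert^2
    - 2\beta_n \langle e_{n-1}, g_n \rangle
    + \beta_n^2 \lVert g_n \rVert^2, \\
\frac{\partial \lVert e_n \rVert^2}{\partial \beta_n} = 0
    \;&\Longrightarrow\;
    \beta_n = \frac{\langle e_{n-1}, g_n \rangle}{\lVert g_n \rVert^2}, \\
\lVert e_n \rVert^2 &= \lVert e_{n-1} \rVert^2
    - \frac{\langle e_{n-1}, g_n \rangle^2}{\lVert g_n \rVert^2}
    \;\le\; \lVert e_{n-1} \rVert^2 .
\end{align}
```

Each new node therefore removes from the residual its component along $g_n$, so the error is non-increasing in n regardless of how the random parameters $a_n$ and $b_n$ fall.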

Evaluation Criteria for Effective Measure of Model
To evaluate the different models, a confusion matrix is generally prepared. Table 2 gives a simple representation of the confusion matrix [34,35], which cross-tabulates predicted and actual values. From the confusion matrix, different performance metrics can be derived, e.g., accuracy, precision, recall (sensitivity), and F-score. To assess the model, nine different metrics are calculated using the formulas given in Table 3 [36].
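As an illustration of how such metrics follow from the four confusion-matrix counts, the sketch below uses the standard textbook definitions; it is assumed, not confirmed by the text, that these match the formulas in Table 3. The function name and the example counts are hypothetical.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (hypothetical helper)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # sensitivity, true-positive rate (TPR)
    specificity = tn / (tn + fp)             # true-negative rate (TNR)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }

# Made-up counts for illustration only; not the paper's confusion matrices.
metrics = classification_metrics(tp=90, fp=10, tn=180, fn=20)
```

Because the F1-score is the harmonic mean of precision and recall, it always lies between the two, which gives a quick sanity check on any reported set of values.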

Experimental Setup
The whole experiment was performed on a system with a 10th Generation Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz, 8 GB RAM, and an NVIDIA GTX 1650 Ti graphics card. The code is written in Python 3.10.0 and run in Jupyter Notebook, which can be installed from https://jupyter.org/install.

Dataset Description
The COVID-19 chest X-ray image dataset [37], downloaded from Kaggle, comprises a total of 13,808 images, of which 3616 are COVID-19 positive cases (26.2%) and 10,192 are normal cases (73.8%). COVID-19 and normal chest X-ray images are kept in separate files. The dataset was divided randomly into training and testing images, with the condition that no testing image is repeated among the training images. During the experiment, 80% of the images were used for training and 20% for testing. All images have the same dimensions (299 × 299 pixels) and are in the PNG file format. Figure 7 shows X-ray images of normal and COVID-19 cases. The histogram of an image gives a global description of the image's appearance. It represents the relative frequency of occurrence of the various intensity values in an image. In the histogram of the COVID-19 image, the intensity value is highest between bins 14-15, whereas in the normal image the histogram has the highest intensity value at bins 16-17.
This difference in the color intensity values helps to distinguish between COVID-19 and normal images. Figure 8 shows the histogram plots of normal and COVID-19 images. Figure 9 shows the training X-ray images of COVID-19 and normal cases.
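The class counts and the 80/20 split described above can be reproduced with a small standard-library sketch. The text states only that the split was random and non-overlapping; splitting each class separately (stratification), the seed, and the function names are assumptions made here.

```python
import random

def imbalance_ratio(n_majority, n_minority):
    """Ratio of majority-class to minority-class sample counts."""
    return n_majority / n_minority

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return disjoint (train, test) index lists, split per class (an assumption)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Class counts taken from the dataset description.
labels = ["covid"] * 3616 + ["normal"] * 10192
train_idx, test_idx = stratified_split(labels)
ir = imbalance_ratio(10192, 3616)
```

Splitting per class keeps the proportion of COVID-19 to normal images the same in both subsets, which matters when the classes are imbalanced.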
Because PCA uses an orthogonal transformation to convert all features into a few independent features, all features are considered during the feature selection process. The data to be processed are reduced to a set of features called a "reduced representation set."
Input: Given N observations along with the class labels $(x_i, t_i)$, $x_i \in \mathbb{R}^m$, $t_i \in \mathbb{R}^C$. Output: IELM model for classification.
(3) For the newly added hidden-layer neuron $O_l$, randomly generate the input weights $\omega_l$ and bias $b_l$.
(4) Calculate the output of the activation function g(x′) for the node $O_l$.

Results and Discussion
In this section, we present the outcomes and analysis of the experiments on COVID-19 patient prediction using the chest X-ray dataset. According to the experimental results, the proposed method shows better performance in terms of accuracy, precision, recall, F1-score, AUC, G-mean, and other parameters. For each model (PCA-SVM, PCA-ELM, and PCA-IELM), a separate confusion matrix is formed. All performance metric values are derived from the confusion matrices (Tables 4-6). The classification accuracy achieved by the proposed PCA-IELM on the chest X-ray dataset is 98.11%, which is better than that of the other two models, PCA-based SVM (91.8%) and PCA-based ELM (93.80%). Accuracy alone may sometimes be misleading, so other metrics are also taken into consideration to confirm the results. For PCA-IELM (refer to Figure 10), recall, F1-score, TPR, TNR, and G-mean are considerably higher than for the other two methods, PCA-SVM and PCA-ELM. The geometric mean (G-mean) is a statistic that analyzes classification performance across the majority and minority classes. Even if negative examples are correctly labelled as such, a poor G-mean indicates weak performance in identifying positive occurrences.
This statistic is essential for preventing overfitting of the negative class and underfitting of the positive class, since the COVID-19 dataset under study is class imbalanced (IR = 2.81). Even so, the PCA-IELM model performs well, attaining the highest G-mean value of 98%, whereas PCA-SVM and PCA-ELM attain 88% and 90.5%, respectively. Table 7 shows the variation in performance (sensitivity, specificity, precision, F1-score, accuracy) for different numbers of hidden nodes in the range 10-150, in steps of 10 hidden nodes. The training and testing accuracies of PCA-IELM show almost the same behavior on the COVID-19 dataset (refer to Figure 11).
There is moderate variation in the accuracy of PCA-IELM with respect to the number of hidden nodes. The accuracy with 10 hidden nodes was 97.73%, and 98.11% was achieved with 140 hidden nodes and beyond (refer to Table 7).
When there is a moderate to large class imbalance, precision-recall (PR) curves should be drawn. Here, the COVID-19 dataset is imbalanced, with an imbalance ratio (IR) of 2.81. Precision is also called the positive predictive value (PPV), and recall is also known as sensitivity, hit rate, or true-positive rate (TPR); both describe the positive cases and not the negative ones. Most machine learning algorithms involve a trade-off between recall and precision. A good PR curve has a greater AUC (area under the curve). Figures 12(b), 13(b), and 14(b) depict the PR curves. Figure 14(b) shows the greatest AUC, which indicates the better performance of PCA-IELM compared with the other two models. In addition, the ROC curve in Figure 14(a) also covers more AUC than those in Figures 12(a) and 13(a). Therefore, PCA-IELM performs better than PCA-SVM and PCA-ELM. The proposed PCA-IELM model also outperforms previously developed models for the identification of COVID-19 patients from chest X-ray images (refer to Table 8 [38][39][40][41][42][43][44][45][46][47]). The training and testing time taken by the proposed PCA-IELM model was higher (refer to Table 9) because the model is built incrementally rather than in a single pass.
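How a PR curve and its area are obtained from classifier scores can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the step-wise sum (average precision) is one common way to summarize the area and may differ from the method used for the figures, tied scores are not handled specially, and the labels and scores are made up.

```python
import numpy as np

def precision_recall_points(y_true, scores):
    """Precision and recall after each sample, taken in order of decreasing score."""
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    tp = np.cumsum(y_sorted == 1)            # true positives at each cut-off
    fp = np.cumsum(y_sorted == 0)            # false positives at each cut-off
    return tp / (tp + fp), tp / tp[-1]

def average_precision(precision, recall):
    """Step-wise area under the PR curve (sum of precision times recall increment)."""
    recall_steps = np.diff(np.concatenate([[0.0], recall]))
    return float(np.sum(recall_steps * precision))

# Made-up labels and scores for illustration only.
y_demo = np.array([1, 1, 0, 1, 0, 0])
s_demo = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
precision, recall = precision_recall_points(y_demo, s_demo)
ap = average_precision(precision, recall)
```

A classifier that ranks every positive above every negative reaches an area of 1, so larger areas indicate a better trade-off between precision and recall.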

Conclusions
In this paper, an effective classification model based on principal component analysis (PCA) and the incremental extreme learning machine (IELM) is proposed for the COVID-19 chest X-ray image dataset. By developing the PCA-IELM model, this study established the valuable application of the ELM model to classifying COVID-19 patients from X-ray images. The proposed PCA-based IELM algorithm is an efficient IELM-based algorithm. The hidden node parameters are determined from the information returned by PCA on the training dataset, and the output node parameters are determined using the Moore-Penrose generalized inverse. PCA-IELM utilizes the best feature of IELM, which is to add hidden nodes incrementally and determine the output weights analytically, whereas ELM requires the appropriate number of hidden nodes to be set manually, which amounts to trial and error. In comparison with the ELM, the output error of the IELM reduces rapidly and approaches zero as the number of hidden neurons increases. It was observed that the performance of PCA-IELM increased as the number of hidden nodes increased and became stable at 150 hidden nodes. PCA-IELM outperforms PCA-SVM and PCA-ELM in terms of accuracy (98.11%), precision (96.11%), recall (97.50%), F1-score (98.50%), G-mean (98%), etc. The suggested research contributes to the prospect of a low-cost, quick, and automated diagnosis of COVID-19 patients, and it may be used in clinical scenarios.
This effective system can provide early detection of COVID-19 patients. As a result, it is helpful in controlling the further spread of the virus from an affected person. It offers intelligent assistance for radiologists in accurately diagnosing COVID-19 in X-ray images.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.